DNA matching

ABSTRACT

A method of searching a computer database containing a plurality of stored DNA profiles is provided. The method involves generating a search profile formed of two or more allele identities for each of one or more loci, at least one of the allele identities having a limited range of values, with the search profile being compared against the one or more stored DNA profiles from a database to establish matches between the search and stored profile.

This application is a Continuation Application of U.S. Ser. No. 12/161,758 filed 3 Jun. 2010, which is a National Stage Application of PCT/GB2007/000365, filed 2 Feb. 2007, which claims benefit of Serial No. 0602106.7, filed 2 Feb. 2006 in Great Britain and which applications are incorporated herein by reference. To the extent appropriate, a claim of priority is made to each of the above disclosed applications.

This invention concerns improvements in and relating to DNA matching, particularly, but not exclusively, between a first DNA profile and one or more stored profiles held in a database.

Existing approaches to the matching of a DNA profile to stored profiles are limited in their versatility. It is amongst the potential aims of the present invention to provide a more discriminating, whilst fully encompassing, approach to DNA matching.

According to a first aspect of the invention we provide a method of searching a computer database containing a plurality of stored DNA profiles, the method comprising

generating a search profile, the search profile being formed of two or more allele identities for each of one or more loci, the allele identities having one of a value or a limited range of values or any value, wherein at least one of the allele identities has a limited range of values;

accessing one or more of the stored DNA profiles from the computer database, the stored DNA profiles having two or more allele identities for each of one or more loci, the allele identities having one of a value or a range of values or any value;

comparing, using a computer implemented method, the search profile against the one or more stored DNA profiles;

establishing that the search profile matches a stored DNA profile when, in respect of a locus, the allele identities of the search profile correspond to or fall within the values for the allele identities for that locus of that stored DNA profile;

outputting a data set, the data set indicating those of the stored DNA profiles established as matching the search profile.
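The steps of the first aspect can be illustrated with a minimal Python sketch. It is not taken from the specification and rests on several assumptions: an identity is modelled as a set of allowed allele values, the marker `ANY` stands for "any value", two identities are treated as compatible when their allowed values overlap, the identities at a locus are compared in any order, and a locus absent from a stored profile is ignored. All function names are hypothetical.

```python
from itertools import permutations

ANY = None  # hypothetical marker for an identity that may take any value


def identity_compatible(search_id, stored_id):
    """An identity is a set of allowed allele values, or ANY for a wildcard."""
    if search_id is ANY or stored_id is ANY:
        return True
    # Assumption: identities are compatible when they share at least one value.
    return bool(set(search_id) & set(stored_id))


def locus_matches(search_ids, stored_ids):
    """Try every ordering of the stored identities against the search identities."""
    if len(search_ids) != len(stored_ids):
        return False
    return any(
        all(identity_compatible(s, t) for s, t in zip(search_ids, ordering))
        for ordering in permutations(stored_ids)
    )


def profile_matches(search, stored):
    """Profiles map a locus name to a tuple of identities."""
    for locus, search_ids in search.items():
        stored_ids = stored.get(locus)
        if stored_ids is None:
            continue  # assumption: a locus missing from the stored profile does not exclude it
        if not locus_matches(search_ids, stored_ids):
            return False
    return True


def search_database(search, database):
    """Return the identifiers of the stored profiles that match the search profile."""
    return [pid for pid, stored in database.items() if profile_matches(search, stored)]
```

For example, a search profile `{"D3": ({15, 16}, {16, 17})}` would match a stored profile whose D3 identities are `({17}, {15})`, because the pairing is tried in both orders, but not one whose D3 identities are `({14}, {15})`.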

The first aspect of the invention may include any of the features, options or possibilities set out elsewhere in this document.

According to a second aspect of the invention we provide a method of searching a database containing a plurality of stored DNA profiles, the method comprising

generating a search profile, the search profile being formed of two or more allele identities for each of one or more loci, the allele identities having one of a value or a range of values or any value, wherein at least one of the allele identities has a range of values;

accessing one or more of the stored DNA profiles from the database, the stored DNA profiles having two or more allele identities for each of one or more loci, the allele identities having one of a value or a range of values or any value;

comparing the search profile against the one or more stored DNA profiles;

establishing that the search profile matches a stored DNA profile when, in respect of a locus, the allele identities of the search profile correspond to or fall within the values for the allele identities for that locus of that stored DNA profile;

outputting a data set, the data set indicating those of the stored DNA profiles established as matching the search profile.

The second aspect of the invention may include any of the features, options or possibilities set out elsewhere in this document, including the following.

The database may be a computer database. The method may include comparing, using a computer implemented method, the search profile against the one or more stored DNA profiles.

According to a third aspect of the invention we provide a method of searching a database containing a plurality of stored profiles, the method comprising

generating a search profile, the search profile being formed of two or more identities for each of one or more targets, the identities having one of a value or a range of values or any value, wherein at least one of the identities has a range of values;

accessing one or more of the stored profiles from the database, the stored profiles having two or more identities for each of one or more targets, the identities having one of a value or a range of values or any value;

comparing the search profile against the one or more stored profiles;

establishing that the search profile matches a stored profile when, in respect of a target, the identities of the search profile correspond to or fall within the values for the identities for that target of that stored profile;

outputting a data set, the data set indicating those of the stored profiles established as matching the search profile.

The third aspect of the invention may include any of the features, options or possibilities set out elsewhere in this document, including the following.

The database may be a computer database. The method may include comparing, using a computer implemented method, the search profile against the one or more stored DNA profiles. The stored profiles may be stored DNA profiles. The search profile identities may be allele identities. The search profile targets may be loci. The stored DNA profile identities may be allele identities. The stored DNA profile targets may be loci.

The first and/or second and/or third aspects of the present invention may provide from amongst the following features.

The database may contain at least 10,000 stored profiles, more preferably contains at least 100,000 stored profiles and ideally contains at least 1,000,000 stored profiles.

The database may include stored profiles which have two or more identities for each of the set of loci used in the database. The database may include stored profiles which potentially have two or more identities for each of at least 10 loci. The database may include stored profiles which lack one or more of the identities for one or more loci. The database may include stored profiles which have been assigned an indication of any value being possible or have been assigned a wildcard function for one or more identities of one or more loci.

The search profile may comprise two or more alternative single profiles. The alternative single profiles may be separately compared against the one or more stored profiles. Preferably matches for a search profile which comprises two or more alternative single profiles are outputted as a single data set. Preferably the single profiles include two or more, preferably allele, identities for each of one or more targets, preferably loci. Preferably the single profiles have identities having one of a value or any value. Preferably the presence of an identity which has a range of values within the search profile is provided by the two or more different values used for an identity between different single profiles.
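The alternative single profile option described above can be sketched as follows. This is an illustrative Python outline only, assuming two identities per locus, stored identities that are single values or a wildcard, and order-insensitive comparison within a locus; the names `expand_to_single_profiles`, `single_matches` and `search_with_alternatives` are hypothetical and do not come from the specification.

```python
from itertools import product

ANY = None  # hypothetical marker for "any value"


def expand_to_single_profiles(search):
    """Turn a search profile with ranges into alternative single profiles.

    `search` maps a locus to a tuple of identities, each a set of values or ANY.
    Each yielded single profile maps a locus to a tuple of single values or ANY.
    """
    loci = list(search)
    per_locus = []
    for locus in loci:
        options = [[ANY] if ident is ANY else sorted(ident) for ident in search[locus]]
        per_locus.append(list(product(*options)))
    for combination in product(*per_locus):
        yield dict(zip(loci, combination))


def _same(a, b):
    return a is ANY or b is ANY or a == b


def single_matches(single, stored):
    """Compare one single profile with one stored profile (two identities per locus)."""
    for locus, (a, b) in single.items():
        if locus not in stored:
            continue  # assumption: a missing locus does not exclude the stored profile
        x, y = stored[locus]
        if not ((_same(a, x) and _same(b, y)) or (_same(a, y) and _same(b, x))):
            return False
    return True


def search_with_alternatives(single_profiles, database):
    """Compare each single profile separately, then output one merged data set."""
    hits = set()
    for single in single_profiles:
        hits.update(pid for pid, stored in database.items() if single_matches(single, stored))
    return sorted(hits)
```

In this sketch the range of values in the search profile exists only as the variation between the single profiles, and a stored profile that matches several single profiles appears once in the merged output.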

The search profile may be a single profile. The single profile may include at least one of the allele identities having a limited range of values.

The two or more identities for a target, preferably a locus, may have the same or different values. The value of one or more of the identities, preferably all having a value, may be an integer. The one or more identities having any value may be provided by a wildcard function. The value of one or more of the identities, preferably all having a value or limited range of values, may be expressed in terms of an allele size. The value of one or more of the identities, preferably all having a value or limited range of values, may be expressed in terms of an allele designation. The identities having a limited range of values may have a range of 5 allele designations or less.

A plurality of loci may be included in the search profile. The search profile may include loci from one or more of D3, VWA, D16, D2, D8, D21, D18, D19, THO or FGA.

The method may establish that the search profile matches a stored DNA profile when, in respect of more than one locus, the allele identities of the search profile correspond to or fall within the values for the allele identities for those loci of that stored DNA profile.

The outputted data set may provide a list of stored DNA profiles established as matching the search profile. The outputted data set may provide a ranked list, with the rank being provided according to a likelihood of the match.

The method may include using the outputted data set to indicate a person and/or item and/or location which was the source of a DNA profile matching the search profile.

Partial or complete DNA profiles may be obtained in a variety of ways and from a variety of sources. They are of particular interest in forensic science. An important part of the consideration of a DNA profile is to compare it with one or more other profiles. The comparison can be used to establish that there is a match, or a likelihood of a match, between the two.

The present invention provides a method for searching a DNA profile database in a way which provides a balanced approach to capturing potential matches of interest, whilst still providing significant discriminating power so as to avoid capturing irrelevant potential matches. The present invention may also allow new questions to be asked in the search of the database, for example “find me any potential offspring from these alleged parents”. The invention is suitable for use in conjunction with a database featuring DNA profiles obtained by the analysis of DNA containing samples from individuals, mixtures, crime scenes and items. The invention is suitable for use with a database such as The National DNA Database (UK Registered Trade Mark).

The present invention allows a constrained range of values to be set for one or more of the allele identities involved in the search profile. Constraining the range ensures that all realistically possible values for that identity are considered and so no potentially relevant matches are inadvertently discarded. At the same time, the constraining of the range ensures that unrealistic values for the identities are not considered. Considering such unrealistic values could potentially throw up a very large number of matches which are not possible in reality.

Thus in a search profile, the possible values for the identities for the various loci may be as follows: D3 15 or 16, 16 or 17; VWA 14 or 15, 14 or 15; D16 14 or 15, any value; D21 14 or 15 or 18, any value; THO 15 or 16, 15 or 16 or 17 or 18.
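One way to hold the example search profile above in a program is sketched below in Python. The representation is an assumption made for illustration: each identity becomes a frozenset of allowed allele designations, "any value" becomes the hypothetical marker `ANY`, and the helper names are invented here rather than taken from the specification.

```python
ANY = None  # hypothetical marker for "any value"

# The example values from the text, written locus by locus.
EXAMPLE_TEXT = {
    "D3": "15 or 16, 16 or 17",
    "VWA": "14 or 15, 14 or 15",
    "D16": "14 or 15, any value",
    "D21": "14 or 15 or 18, any value",
    "THO": "15 or 16, 15 or 16 or 17 or 18",
}


def parse_identity(text):
    """Convert '15 or 16' to frozenset({15, 16}) and 'any value' to ANY."""
    text = text.strip()
    if text == "any value":
        return ANY
    return frozenset(int(part) for part in text.split(" or "))


def parse_search_profile(spec):
    """Convert the textual form into {locus: (identity, identity)}."""
    return {
        locus: tuple(parse_identity(part) for part in identities.split(","))
        for locus, identities in spec.items()
    }


search_profile = parse_search_profile(EXAMPLE_TEXT)
```

A single value is then a one-element set, a limited range is a set with several elements, and a wildcard is `ANY`, so all three kinds of identity named in the text share one structure.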

Such an approach allows a single set of results to be obtained, whilst taking into account within the search profiles the maximum amount of information available. It may be impossible to determine a known allele absolutely, but it still may be possible to say more than “no information” about it. As a result the success rate for samples where a profile is obtained, but cannot be expressed as a single result, is increased.

The search tool can be used to make comparisons for a variety of purposes. Thus, referring to the values provided above, the following purposes may be under consideration:

1) The variation selected for loci D3 and VWA would be typical of that used to investigate a search profile which was thought to be a 2 person mixture. In such a case, a match might be made based on the specific identities for a locus independent of a match with the specific identities of another locus of that search profile, provided there was a match with the specific identities of the other locus in one of the search profiles. Thus a match would exist where D3 was 15,16 and VWA was 14,15 because this combination was envisaged within the ranges.

2) The variation selected for locus D16 is typical of that used to consider a parent child relationship between search and stored profile. Loci for which ambiguity is present often occur in such cases.

3) The variation selected for locus D21 is typical of that considered for the minor alleles in a major minor profile. In such cases, the minor alleles can often be deduced, but the deductions are ambiguous. Use of a wildcard means that the match results have to be screened to see that the wildcard part of the match is viable given the observed profile.

4) The variation selected for THO is typical of the considerations involved for a three person mixture.
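The screening step mentioned in item 3) can be sketched as follows. The specification does not say how viability is judged, so this Python outline assumes one simple criterion purely for illustration: the stored allele absorbed by the wildcard must be among the alleles actually observed at that locus. It also assumes two identities per locus and single-valued stored alleles. The function names and the stored and observed values in the usage note are hypothetical.

```python
ANY = None  # hypothetical marker for "any value"


def wildcard_fill_options(search_ids, stored_ids):
    """List the stored alleles that each valid pairing assigns to the wildcard.

    `search_ids` is a pair of identities (a set of values or ANY);
    `stored_ids` is a pair of single allele values.
    An empty list means the locus does not match at all.
    """
    first, second = search_ids
    x, y = stored_ids
    options = []
    for s1, s2 in ((x, y), (y, x)):
        if (first is ANY or s1 in first) and (second is ANY or s2 in second):
            fills = tuple(v for ident, v in ((first, s1), (second, s2)) if ident is ANY)
            if fills not in options:
                options.append(fills)
    return options


def wildcard_viable(search_ids, stored_ids, observed_alleles):
    """Assumed criterion: some valid pairing fills the wildcard only with observed alleles."""
    return any(
        all(value in observed_alleles for value in fills)
        for fills in wildcard_fill_options(search_ids, stored_ids)
    )
```

With the D21 identities from the example, `(frozenset({14, 15, 18}), ANY)`, a stored pair `(14, 30)` would pass the screen when the observed alleles are `{14, 28, 30}` and fail when they are `{14, 28}`. In practice the screen would rest on whatever evidence the examiner has for the minor contributor.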

1. A method of searching a computer database containing a plurality of stored DNA profiles, the method comprising generating a search profile, the search profile being formed of two or more allele identities for each of one or more loci, the allele identities having one of a value or a limited range of values or any value, wherein at least one of the allele identities has a limited range of values; accessing one or more of the stored DNA profiles from the computer database, the stored DNA profiles having two or more allele identities for each of one or more loci, the allele identities having one of a value or a range of values or any value; comparing, using a computer implemented method, the search profile against the one or more stored DNA profiles; establishing that the search profile matches a stored DNA profile when, in respect of a locus, the allele identities of the search profile correspond to or fall within the values for the allele identities for that locus of that stored DNA profile; outputting a data set, the data set indicating those of the stored DNA profiles established as matching the search profile.
2. A method of searching a database containing a plurality of stored profiles, the method comprising generating a search profile, the search profile being formed of two or more identities for each of one or more targets, the identities having one of a value or a range of values or any value, wherein at least one of the identities has a range of values; accessing one or more of the stored profiles from the database, the stored profiles having two or more identities for each of one or more targets, the identities having one of a value or a range of values or any value; comparing the search profile against the one or more stored profiles; establishing that the search profile matches a stored profile when, in respect of a target, the identities of the search profile correspond to or fall within the values for the identities for that target of that stored profile; outputting a data set, the data set indicating those of the stored profiles established as matching the search profile.
3. The method of claim 2, wherein the database is a computer database and the method includes comparing, using a computer implemented method, the search profile against one or more stored profiles.
4. The method of claim wherein the stored profiles are stored DNA profiles.
5. The method of claim 2, in which the search profile identities are allele identities and/or the stored DNA profile identities are allele identities.
6. The method of claim 2, in which the search profile targets are loci and/or the stored DNA profile targets are loci.
7. The method of claim 1, in which the search profile comprises two or more alternative single profiles.
8. The method of claim 7 in which the alternative single profiles are separately compared against the one or more stored profiles.
9. The method of claim 7 in which the matches for a search profile which comprises two or more alternative single profiles are outputted as a single data set.
10. The method of claim 7, in which the presence of an identity which has a range of values within the search profile is provided by the two or more different values used for the identity between different single profiles.
11. The method of claim 1, in which the search profile is a single profile.
12. The method of claim 1, in which the two or more identities for a target have the same or different values.
13. The method of claim 1, in which the value of one or more of the identities is an integer.
14. The method of claim 1, in which the value of all the identities is an integer.
15. The method of claim 1, in which the one or more identities having any value are provided by a wildcard function.
16. The method of claim 1, in which the value of one or more of the identities having a value or limited range of values is expressed in terms of an allele size.
17. The method of claim 1, in which the value of one or more of the identities having a value or limited range of values is expressed in terms of an allele designation.
18. The method of claim 17 in which the identities having a limited range of values have a range of 5 allele designations or less.
19. The method of claim 1, in which the method establishes that the search profile matches a stored DNA profile when, in respect of more than one locus, the allele identities of the search profile correspond to or fall within the values for the allele identities for those loci of that stored DNA profile.
20. The method of claim 1, in which the outputted data set provides a list of stored DNA profiles established as matching the search profile.
21. The method of claim 1, in which the outputted data set provides a ranked list, with the rank being provided according to a likelihood of the match.
22. The method of claim 1, in which the method includes using the outputted data set to indicate a person and/or item and/or location which was the source of a DNA profile matching the search profile.